Abstract

Alarming experience with a well-regarded safety analysis code prompted
the authors to examine safety issues of complex systems. Their first
concern was the statistical characteristics of such a code. We point out
a notable weakness of the so-called 0.95|0.95 methodology: when the
search for the tolerance limit is repeated, a higher value is obtained
with non-negligible probability. We propose the sign test as an
alternative method. We also give the correct form of Wilks' formula when
the number of parameters subjected to limitation is two or more.

Keywords: safety analysis, 0.95|0.95 methodology, sign test 



1 Introduction 

Alarming experience with a well-regarded safety analysis code [1, 2],
which is widely used in the licensing process of nuclear power plants,
prompted the authors to examine safety issues of nuclear reactors. Their
first concern was the statistical characteristics of such a code. To
judge whether a given nuclear reactor is safe, one has to demonstrate
that the safety criteria are met with a reasonable probability. But to
judge the output of the code, one needs to know the probability
distribution of the output.

In a former paper [5] we discussed the handling of statistics of model
calculations with several outputs. The present work provides a correct
statistical estimation of a quantile, and we point out the inadequacy of
the traditional 95% probability limit approach, which seems to be the
practice at the US Nuclear Regulatory Commission. We advocate the sign
test instead.

Let us consider the results of $N$ runs of a code modelling a single
output variable $y$, which is subjected to limitation. Let the output
values be ordered:

$$ y(1) < y(2) < \cdots < y(N). \tag{1} $$

We call the ensemble (1) a sample. Let the acceptance range be given as
$(-\infty, U_T]$, where $U_T$ is the technological limit for $y$. We
assume that the distribution of $y$ is unknown, and are looking for a
quantile $Q_\gamma$ such that

$$ \int_{-\infty}^{Q_\gamma} dG(y) = \gamma, \tag{2} $$

where $G(y)$ is the unknown cumulative distribution function of the
output variable $y$. The quantile $Q_\gamma$ is to be derived from
measured values; thus its estimate is itself a random variable.

In Section 2, we address the problem of estimating the quantile
$Q_\gamma$. Two solutions are mentioned: the classical Bayesian solution
and a recent solution, which is applicable to several variables. In
Section 3, we present an example where the 0.95|0.95 methodology seems
to fail, and in Section 4 we suggest another methodology based on the
sign test. Our concluding remarks are summarized in the last section.



2 Estimation for one-tailed quantile 

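The random interval $(-\infty, y(s)]$ covers a proportion larger than
$\gamma$ of the unknown distribution function $G(y)$ with probability
$\beta$ when

$$ \beta = \mathcal{P}\{ y(s) > Q_\gamma \}, \tag{3} $$

where $\mathcal{P}\{A\}$ denotes the probability of event $A$. It can be
shown [4] that

$$ \beta = \sum_{j=0}^{s-1} \binom{N}{j} \gamma^{j} (1-\gamma)^{N-j}. \tag{4} $$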

When $s = N$, i.e. the largest element of the sample is chosen as the
upper limit of the random interval, one obtains the well-known formula:

$$ \beta = 1 - \gamma^{N}. \tag{5} $$

Since one finds misinterpretations in engineering practice, it is not
superfluous to underline the proven meaning of formula (5): $\beta$ is
the probability that the largest value $y(N)$ of a sample comprising $N$
observations is greater than the $\gamma$ quantile of the unknown
distribution of the output variable $y$. An equivalent formulation is
that $\beta$ is the probability that the interval $(-\infty, y(N)]$
covers a portion larger than $\gamma$ of the unknown distribution $G(y)$
of the output variable $y$.
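As an illustration of formula (5), the following minimal Python sketch
evaluates the coverage probability of the $s$-th order statistic and
searches for the smallest sample size at which the sample maximum reaches
a prescribed $\beta$. The function names are hypothetical and introduced
only for this example; the first function relies on the elementary fact
that $y(s) > Q_\gamma$ holds exactly when fewer than $s$ observations fall
at or below $Q_\gamma$.

```python
from math import comb


def coverage_probability(gamma: float, n: int, s: int) -> float:
    """P{y(s) > Q_gamma}: fewer than s of the n observations lie at or below Q_gamma."""
    return sum(comb(n, j) * gamma**j * (1.0 - gamma) ** (n - j) for j in range(s))


def min_sample_size(gamma: float, beta: float) -> int:
    """Smallest N for which the sample maximum satisfies 1 - gamma**N >= beta."""
    n = 1
    while 1.0 - gamma**n < beta:
        n += 1
    return n


if __name__ == "__main__":
    for n in (58, 59):
        # Coverage of the sample maximum for gamma = 0.95
        print(n, round(coverage_probability(0.95, n, n), 4))
    print("smallest N:", min_sample_size(0.95, 0.95))
```

With $\gamma = \beta = 0.95$ the strict inequality is first met at
$N = 59$, while $N = 58$ gives a coverage of about 0.949.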

2.1 Old Bayesian method 

Suppose we carry out $N$ runs, i.e., we determine the output variable
from $N$ fluctuating inputs, and define a fixed acceptance region
$\mathcal{H}_a = [L_T, U_T]$. The probability

$$ \mathcal{P}\{ y \in \mathcal{H}_a \} = \int_{\mathcal{H}_a} g(u)\, du = w $$

Table 1: Number of observations $N$, with $N-k$ failures, at which
$w > \omega$ holds with probability at least $\alpha$.

  $\alpha$   $\omega$   N-k = 0   N-k = 1   N-k = 2
  0.90       0.90            21        31        51
  0.90       0.95            44        75       104
  0.90       0.99           228       387       530
  0.95       0.90            27        45        60
  0.95       0.95            57        92       123
  0.95       0.99           297       472       626
  0.99       0.90            43        63        80
  0.99       0.95            89       129       164
  0.99       0.99           457       660       836

of the output variable $y$ lying in $\mathcal{H}_a$ is unknown. However,
knowing that $k$ elements out of $N$ are in the acceptance interval, we
can estimate the probability that the unknown acceptance probability $w$
is greater than a prescribed $\omega$ without knowing the density
function $g(u)$. The claim is based on Bayes' theorem on conditional
probabilities and asserts

$$ \beta(\omega \mid N, k) = \sum_{j=0}^{k} \binom{N+1}{j} \omega^{j} (1-\omega)^{N+1-j}. \tag{6} $$

The proof is available in textbooks. Using (6), we can easily determine
the allowed number of rejections in a sample of $N$ elements to make
sure that $w > \omega$ holds with a prescribed probability
$\beta \ge \alpha$. In Tab. 1, we have collected a few examples to give
an impression of how expression (6) works. It is noteworthy that even if
$N - k = 0$, i.e. when all outputs are accepted, there is a non-zero
probability that outputs will appear which should have been rejected. As
we see, no failure out of 21 runs assures the same probability as one
failure out of 31 runs or two failures out of 51 runs (cf. the first row
of Tab. 1).
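The Bayesian estimate can be explored numerically. The sketch below is
an illustration under a stated assumption: it takes a uniform prior for
$w$, in which case the posterior probability of $w > \omega$ after $k$
successes in $N$ runs equals the probability that a binomial variable
with $N+1$ trials and success probability $\omega$ does not exceed $k$.
The function names are hypothetical, and whether this prior coincides in
every detail with the one behind Tab. 1 is not verified here.

```python
from math import comb


def bayes_confidence(omega: float, n: int, k: int) -> float:
    """Posterior P{w > omega | k successes in n runs}, assuming a uniform prior on w."""
    return sum(
        comb(n + 1, j) * omega**j * (1.0 - omega) ** (n + 1 - j)
        for j in range(k + 1)
    )


def min_runs(omega: float, alpha: float, failures: int) -> int:
    """Smallest n with exactly `failures` rejected runs such that the confidence >= alpha."""
    n = failures
    while bayes_confidence(omega, n, n - failures) < alpha:
        n += 1
    return n


if __name__ == "__main__":
    # No failure allowed, omega = alpha = 0.90
    print(min_runs(0.90, 0.90, 0))
```

For zero failures and $\alpha = \omega = 0.90$ the search stops at 21
runs, since $1 - 0.9^{22} \approx 0.9015$ while $1 - 0.9^{21} \approx 0.8906$.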



2.2 Case of Several Variables 

The following statement generalizes the estimate of a quantile to several
output variables. In the case of $n \ge 2$ output variables with
continuous joint distribution function $G(y_1, \ldots, y_n)$ it is
possible to construct $n$ pairs of random intervals $[L_j, U_j]$,
$j = 1, \ldots, n$, such that the probability of the inequality

$$ \int_{L_1}^{U_1} \cdots \int_{L_n}^{U_n} g(y_1, \ldots, y_n)\, dy_1 \cdots dy_n > \gamma \tag{7} $$

is free of $g(y_1, \ldots, y_n)$ and is given by

$$ \mathcal{P}\left\{ \int_{L_1}^{U_1} \cdots \int_{L_n}^{U_n} g(y_1, \ldots, y_n)\, dy_1 \cdots dy_n > \gamma \right\} = \beta, $$

where $0 < \beta < 1$ is a given number. Details and proof of the
statement can be found in [4].



3 Challenge of the 0.95|0.95 methodology 

In the present section, we consider an example. We assume the single
output variable $y$ to have a lognormal distribution with parameters $m$
and $d$. This will be our "unknown" distribution $G(y)$. The density
function is

$$ g(y) = \frac{1}{y\, d \sqrt{2\pi}} \exp\left[ -\frac{(\log y - m)^2}{2 d^2} \right], \tag{8} $$

where $y > 0$.

We use Monte Carlo simulation to generate four samples of size
$N = 100$; in the simulation we take $m = 2.5$ and $d = 0.5$. The goal
is to get point estimates of the 0.95-quantile for each sample and to
determine the shortest two-tailed confidence intervals which cover the
"unknown" quantile $Q_{0.95}$ with probability 0.95; the reference value
is $Q_{0.95} \approx 27.73$. The four samples are labeled A, B, C, and D;
the results of the simulation are summarized in Tab. 2.

Table 2: Confidence intervals $[y(r), y(s)]$ covering the quantile $Q_{0.95}$.

               A           B           C           D
  y(r)         22.66       25.21       22.48       23.29
  Q_{0.95}     27.73       27.73       27.73       27.73
  y(s)         33.25       38.28       35.88       53.05
  (r, s)       (91, 100)   (91, 100)   (91, 100)   (91, 100)



If the upper limit determined by the technology were $U_T = 40$, then
only cases A, B, and C could be considered safe.
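The construction behind Tab. 2 can be sketched in a few lines of Python.
The example below is illustrative only: the seed and the function names
are hypothetical, so the simulated limits will differ from the published
ones. It computes the reference quantile of the lognormal distribution,
the order-statistic interval $[y(r), y(s)]$ of a simulated sample, and the
distribution-free probability that this interval covers $Q_\gamma$, which
equals the probability that the number of observations not exceeding
$Q_\gamma$ lies between $r$ and $s-1$.

```python
import random
from math import comb, exp
from statistics import NormalDist

M, D, N = 2.5, 0.5, 100  # lognormal parameters and sample size used in the text


def lognormal_quantile(p: float, m: float = M, d: float = D) -> float:
    """Quantile of the lognormal distribution with log-mean m and log-sd d."""
    return exp(m + d * NormalDist().inv_cdf(p))


def interval_coverage(n: int, r: int, s: int, gamma: float) -> float:
    """P{y(r) <= Q_gamma < y(s)} = P{r <= B <= s-1} with B ~ Binomial(n, gamma)."""
    return sum(comb(n, j) * gamma**j * (1.0 - gamma) ** (n - j) for j in range(r, s))


def order_statistic_interval(sample, r: int, s: int):
    """Return the r-th and s-th smallest elements (1-based ranks)."""
    ordered = sorted(sample)
    return ordered[r - 1], ordered[s - 1]


if __name__ == "__main__":
    rng = random.Random(2024)  # hypothetical seed
    sample = [rng.lognormvariate(M, D) for _ in range(N)]
    print("reference quantile:", round(lognormal_quantile(0.95), 2))
    print("interval [y(91), y(100)]:", order_statistic_interval(sample, 91, 100))
    print("coverage probability:", interval_coverage(N, 91, 100, 0.95))
```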

Setting $\beta = 0.95$ and $\gamma = 0.95$, from Eq. (5) we get the
sample size $N = 58$, i.e. the largest element of a sample having 58
elements^1 should be chosen as $Q_{0.95}$. We performed the following
numerical experiment. We generated a sample of 58 elements; that sample
is called the basic sample, in notation $y^{(0)}$. Then we repeated the
sample generation $n = 1000$ times, thus obtaining the samples
$y^{(1)}, y^{(2)}, \ldots, y^{(n)}$. The largest elements of those
samples can be seen in Fig. 1. The smallest of these values is 22.62,
the largest is 132.27. One can observe that in 224 samples (more than
22% of the one thousand samples) the maximum exceeds the maximum of the
basic sample ($y^{(0)}(58) \approx 45$). Let us check whether that
number is reliable or not.

The probability that the largest element in a given sample is greater
than $Q_\gamma$ is $1 - \gamma^N$. Let $\xi_n(Q_\gamma)$ stand for the
random variable giving the number of maximum elements exceeding
$Q_\gamma$. The probability distribution of the newly introduced random
variable is

$$ \mathcal{P}\{ \xi_n(Q_\gamma) = k \} = \binom{n}{k} (1 - \gamma^N)^k\, \gamma^{N(n-k)}. \tag{9} $$
^1 In practice $N = 59$ is used.

Figure 1: Results of 1000 samples of size $N = 58$ (horizontal axis:
number of sample). The largest element of the basic sample is
$y^{(0)}(58) \approx 45$.

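From this expression we obtain the expectation value and the variance as

$$ \mathbf{E}\{ \xi_n(Q_\gamma) \} = n (1 - \gamma^N), \tag{10} $$

$$ \mathbf{D}^2\{ \xi_n(Q_\gamma) \} = n\, \gamma^N (1 - \gamma^N). \tag{11} $$

When $n$ and $k$ are sufficiently large, the distribution of the random
variable

$$ \frac{\xi_n(Q_\gamma) - \mathbf{E}\{\xi_n(Q_\gamma)\}}{\mathbf{D}\{\xi_n(Q_\gamma)\}} $$

is approximately standard normal, hence

$$ \mathbf{E}\{\xi_n(Q_\gamma)\} - \lambda\, \mathbf{D}\{\xi_n(Q_\gamma)\} \le \xi_n(Q_\gamma) \le \mathbf{E}\{\xi_n(Q_\gamma)\} + \lambda\, \mathbf{D}\{\xi_n(Q_\gamma)\} \tag{12} $$

is valid with probability $w$, and $\lambda$ is the root of

$$ \Phi(\lambda) = \tfrac{1}{2}(1 + w), \tag{13} $$

$$ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du. \tag{14} $$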

Substituting here $n = 1000$, $N = 58$ and $w = 0.95$, we get
$\mathbf{E}\{\xi_n(Q_\gamma)\} \approx 950$,
$\mathbf{D}\{\xi_n(Q_\gamma)\} \approx 6.96$, $\lambda \approx 1.96$, and
the following relationship is fulfilled with probability 95%:
$936 < \xi_{1000}(Q_{0.95}) < 964$. We cannot estimate the number of
samples in which the maximum exceeds the maximum of the basic sample,
but we can count the number of maximal values exceeding the known
quantile $Q_{0.95}$. That number is 949, which supports the correctness
of the statistics.
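The check above can be repeated with a short simulation. The following
Python sketch is an illustration with a hypothetical seed and
hypothetical function names, so the simulated count will not reproduce
the value 949 exactly. It evaluates the binomial mean and standard
deviation of the number of sample maxima exceeding $Q_\gamma$ and counts
the exceedances in 1000 simulated lognormal samples of size 58.

```python
import random
from math import exp, sqrt
from statistics import NormalDist


def exceedance_moments(n: int, size: int, gamma: float):
    """Mean and standard deviation of the number of sample maxima above Q_gamma."""
    p = 1.0 - gamma**size  # probability that one sample maximum exceeds Q_gamma
    return n * p, sqrt(n * p * (1.0 - p))


def simulate_exceedances(n: int, size: int, gamma: float,
                         m: float = 2.5, d: float = 0.5, seed: int = 1) -> int:
    """Count how many of n lognormal samples have their maximum above Q_gamma."""
    rng = random.Random(seed)  # hypothetical seed
    q = exp(m + d * NormalDist().inv_cdf(gamma))
    count = 0
    for _ in range(n):
        largest = max(rng.lognormvariate(m, d) for _ in range(size))
        if largest > q:
            count += 1
    return count


if __name__ == "__main__":
    mean, sd = exceedance_moments(1000, 58, 0.95)
    lam = NormalDist().inv_cdf(0.975)  # two-sided 95% quantile, about 1.96
    print("mean, sd:", round(mean, 2), round(sd, 2))
    print("95% band:", round(mean - lam * sd, 1), round(mean + lam * sd, 1))
    print("simulated count:", simulate_exceedances(1000, 58, 0.95))
```

Evaluated without rounding, the mean is about 949 and the standard
deviation about 6.96.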

In spite of the nice agreement we wish to underline that the 0.95|0.95
safety policy does not exclude rare events such as a limit violation,
when some of the calculated values are over the limit $U_T$.

Another conclusion is that the maximal element of a single sample of 58
elements would be $y^{(0)}(58)$, and if we repeat the sampling several
times, then in a relatively large number of the samples we get a value
higher than $y^{(0)}(58)$ for the maximal element. In the light of this
experience one asks: is this the intended outcome of the 0.95|0.95
methodology? It is clear that a larger safety margin is needed to
compensate for the weakness of the 0.95|0.95 methodology.

One must mention here that the result found in the example presented
above is not exceptional; it is a direct consequence of a well-known
theorem of mathematical statistics. It is easy to show that if one
repeats the sampling from any continuous distribution $(n+1)$ times
independently, then the probability that at least $k$ out of the $n$
maximal sample elements $y^{(1)}(N), \ldots, y^{(n)}(N)$ will exceed the
initial (basic) sample value $y^{(0)}(N)$ is equal to $1 - k/(n+1)$. The
proof of the theorem and two important remarks are given in the
Appendix.
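The theorem can be checked empirically. The sketch below is an
illustration with hypothetical parameter choices (uniform parent
distribution, small sample size, fixed seed); since the statement is
distribution-free, any continuous distribution would serve. It estimates
the probability that at least $k$ of $n$ repeated sample maxima exceed
the maximum of the basic sample and compares it with $1 - k/(n+1)$.

```python
import random


def estimate_exceedance_probability(n: int, k: int, size: int = 10,
                                    trials: int = 4000, seed: int = 7) -> float:
    """Monte Carlo estimate of P{at least k of n sample maxima exceed the basic maximum}."""
    rng = random.Random(seed)  # hypothetical seed
    hits = 0
    for _ in range(trials):
        basic = max(rng.random() for _ in range(size))
        larger = sum(
            1 for _ in range(n)
            if max(rng.random() for _ in range(size)) > basic
        )
        if larger >= k:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n, k = 9, 3
    print("simulated:", estimate_exceedance_probability(n, k))
    print("theory:   ", 1 - k / (n + 1))
```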



4 Method based on sign test



The concluding remarks at the end of the previous section are not
optimistic. The question is whether one can find a method more suitable
for checking, from a computer model, the safety of a large device. Below
we propose such a method based on the sign test.

Again, we assume the cumulative distribution function $G(y)$ of the
output variable to be continuous but unknown. Let
$\mathcal{S}_N = \{y_1, \ldots, y_N\}$ be a sample of $N$ observations
(runs of a computer model). Define the function

$$ \Delta(x) = \begin{cases} 0, & \text{if } x < 0, \\ 1, & \text{if } x > 0, \end{cases} \tag{15} $$

and introduce the statistical function

$$ z_N = \sum_{k=1}^{N} \Delta(U_T - y_k), \tag{16} $$

which gives the number of sample elements smaller than $U_T$. Criteria
based on the statistical function (16) are called sign criteria, since
$z_N$ counts the positive differences $U_T - y_k$. When $G(y)$ is
continuous, the probability of $U_T - y = 0$ is zero.

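Obviously, the distribution of $z_N$ is binomial. Using the notation

$$ \mathcal{P}\{ \Delta(U_T - y) = 1 \} = \mathcal{P}\{ y < U_T \} = p, \tag{17} $$

we obtain

$$ \mathcal{P}\{ z_N = j \} = \binom{N}{j} p^{j} (1-p)^{N-j}, \qquad j = 0, 1, \ldots, N. \tag{18} $$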
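A minimal Python sketch of the sign statistic and of its binomial
distribution is given below. The function names and the toy sample are
hypothetical and serve only to illustrate the definitions.

```python
from math import comb


def delta(x: float) -> int:
    """Indicator of a positive difference: 1 if x > 0, otherwise 0."""
    return 1 if x > 0 else 0


def sign_statistic(sample, u_t: float) -> int:
    """Number of sample elements below the technological limit u_t."""
    return sum(delta(u_t - y) for y in sample)


def binomial_pmf(j: int, n: int, p: float) -> float:
    """P{z_N = j} for success probability p = P{y < U_T}."""
    return comb(n, j) * p**j * (1.0 - p) ** (n - j)


if __name__ == "__main__":
    toy_sample = [1.0, 5.0, 2.5, 7.2]  # hypothetical outputs
    print(sign_statistic(toy_sample, u_t=4.0))
    print(sum(binomial_pmf(j, 4, 0.9) for j in range(5)))
```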

Our task is to find a confidence interval $[\gamma_L(k), \gamma_U(k)]$
that covers the value $p$ with a prescribed probability $\beta$,
provided we have a sample of size $N$ and in that sample
$z_N = k \le N$. The probability (17) is the probability that the output
$y$ is not larger than the technological limit $U_T$. When the lower
limit $\gamma_L(k)$ of the confidence interval is close to unity, we can
claim at least with probability $\beta$ that the chance of finding the
output $y$ smaller than $U_T$ is also close to unity, and the system
under consideration can be regarded as safe at the level
$[\beta \mid \gamma_L(k)]$.

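If the sample size $N > 50$, the random variable

$$ \zeta_k = \frac{k - Np}{\sqrt{N p (1-p)}} \tag{19} $$

has approximately standard normal distribution. Here $k$ is the number
of sample elements not exceeding $U_T$. Let $\beta$ denote the
confidence level; then

$$ \mathcal{P}\{ |\zeta_k| \le u_\beta \} = 2\Phi(u_\beta) - 1 = \beta, $$

where $\Phi(x)$ is the standard normal distribution function. This
equation can be rewritten in the form

$$ \mathcal{P}\{ |\zeta_k| \le u_\beta \} = \mathcal{P}\{ (N + u_\beta^2)(p - \gamma_L)(p - \gamma_U) \le 0 \} = \beta, $$

where

$$ \gamma_L = \gamma_L(k, u_\beta) = \frac{k + \frac{1}{2} u_\beta^2 - u_\beta \sqrt{k (1 - k/N) + \frac{1}{4} u_\beta^2}}{N + u_\beta^2} \tag{20} $$

and

$$ \gamma_U = \gamma_U(k, u_\beta) = \frac{k + \frac{1}{2} u_\beta^2 + u_\beta \sqrt{k (1 - k/N) + \frac{1}{4} u_\beta^2}}{N + u_\beta^2}. \tag{21} $$

Here $u_\beta$ is the root of

$$ \Phi(u_\beta) = \tfrac{1}{2}(1 + \beta). $$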

In a number of cases it suffices to know the probability of the event
$\{\gamma_L(k, v_\beta) \le p\}$. Since $\zeta_k$ with $k$ fixed is a
decreasing function of $p$, the events $\{\zeta_k \le v_\beta\}$ and
$\{\gamma_L(k, v_\beta) \le p\}$ are equivalent, hence

$$ \mathcal{P}\{ \zeta_k \le v_\beta \} = \mathcal{P}\{ \gamma_L(k, v_\beta) \le p \} = \Phi(v_\beta) = \beta. $$

Consequently, the operation of a system can be regarded as safe if the
parameter $p$ for all output variables is covered by
$[\gamma_L(k, v_\beta), 1]$ with a prescribed probability $\beta$,
provided that $\gamma_L(k, v_\beta)$ is close to unity.



Table 3: Number of successes $k$ in a sample of size $N$.

  k    99   108   118   128   137   147   157   166   176   185   195
  N   100   110   120   130   140   150   160   170   180   190   200



Table 3 gives the number of successes in a sample of size $N$ needed
for acceptance at the level $\beta = \gamma_L = 0.95$. We utilized the
approximate formula (20) to derive the entries in Tab. 3.
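The normal approximation can be sketched as follows. The code is an
illustration under stated assumptions: the limits are taken as the roots
in $p$ of $(k - Np)^2 = v^2 N p (1-p)$, the one-sided quantile $v_\beta$
with $\Phi(v_\beta) = \beta$ is used for the acceptance search, and the
function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def lower_limit(k: int, n: int, quantile: float) -> float:
    """Smaller root in p of (k - n p)^2 = quantile^2 * n p (1 - p)."""
    u2 = quantile**2
    return (k + u2 / 2 - quantile * sqrt(k * (1 - k / n) + u2 / 4)) / (n + u2)


def upper_limit(k: int, n: int, quantile: float) -> float:
    """Larger root in p of (k - n p)^2 = quantile^2 * n p (1 - p)."""
    u2 = quantile**2
    return (k + u2 / 2 + quantile * sqrt(k * (1 - k / n) + u2 / 4)) / (n + u2)


def min_successes(n: int, beta: float = 0.95, level: float = 0.95):
    """Smallest k whose one-sided lower limit reaches `level`, or None if there is none."""
    v = NormalDist().inv_cdf(beta)  # one-sided quantile, Phi(v) = beta
    for k in range(n + 1):
        if lower_limit(k, n, v) >= level:
            return k
    return None


if __name__ == "__main__":
    print(min_successes(100))
    v = NormalDist().inv_cdf(0.95)
    print(round(lower_limit(99, 100, v), 4), round(lower_limit(98, 100, v), 4))
```

For $N = 100$ the search returns $k = 99$: the lower limit is about 0.956
for 99 successes and about 0.941 for 98. Entries that sit very close to
the 0.95 threshold may shift by one depending on how the limit is
rounded.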

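When the sample size is less than 50, we may not apply the
asymptotically valid normal distribution. The derivation of the
confidence limits given below is a modification of the method proposed
by Clopper and Pearson [6]. The probability of at most $k$ successes
from $N$ observations is given by

$$ S_k^{(N)}(p) = \sum_{j=0}^{k} \binom{N}{j} p^{j} (1-p)^{N-j}, $$

where $p = \mathcal{P}\{y < U_T\}$. This formula can be recast as

$$ S_k^{(N)}(p) = 1 - \frac{N!}{k!\,(N-k-1)!} \int_0^p t^{k} (1-t)^{N-k-1}\, dt, \qquad k < N, $$

and it is clear from that expression that $S_k^{(N)}(p)$ is a
monotonically decreasing function of $p$. Since

$$ S_k^{(N)}(p) = \begin{cases} 1, & \text{if } p = 0, \\ 0, & \text{if } p = 1, \end{cases} $$

it assumes any given value only once in the interval $[0,1]$.
Consequently, a value $p = p_\delta$ exists such that

$$ S_k^{(N)}(p_\delta) = \delta, \qquad \forall\; 0 < \delta < 1. $$

Exploiting the monotonicity, we can construct a function that increases
with $p$, so that it exceeds its value at $p_\delta$ when
$p > p_\delta$. Such a function is

$$ R_k^{(N)}(p) = 1 - S_{k-1}^{(N)}(p) = \sum_{j=k}^{N} \binom{N}{j} p^{j} (1-p)^{N-j}. $$

Finally, we establish the upper limit $\gamma_U$ from

$$ S_k^{(N)}(\gamma_U) \le \tfrac{1}{2}(1 - \beta), $$

and the lower limit $\gamma_L$ from

$$ R_k^{(N)}(\gamma_L) \le \tfrac{1}{2}(1 - \beta). $$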

The interval $[\gamma_L, \gamma_U]$ covers the unknown parameter $p$
with probability $\beta$. The dependence of $\gamma_L$ and $\gamma_U$ on
the number of successes is shown in Fig. 2 for a sample of $N = 100$
elements; there, $d$ stands for the confidence level $\beta$.

Figure 2: Dependence of $\gamma_L$ and $\gamma_U$ on the number of
successes in a sample of $N = 100$ elements.
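The exact limits can be obtained numerically by solving the two tail
equations with bisection. The sketch below is an illustration with
hypothetical function names; the tail probability is passed explicitly,
so that $\tfrac{1}{2}(1-\beta)$ gives the two-sided limits described
above, while $1-\beta$ gives a one-sided limit.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """P{at least k successes in n trials}; increases with p."""
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


def lower_tail(k: int, n: int, p: float) -> float:
    """P{at most k successes in n trials}; decreases with p."""
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k + 1))


def _bisect(func, target: float, increasing: bool, tol: float = 1e-10) -> float:
    """Solve func(p) = target on [0, 1] for a monotone func."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (func(mid) < target) == increasing:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies to the left of mid
    return (lo + hi) / 2


def lower_limit(k: int, n: int, tail: float) -> float:
    """p at which the probability of at least k successes equals `tail`."""
    if k == 0:
        return 0.0
    return _bisect(lambda p: upper_tail(k, n, p), tail, increasing=True)


def upper_limit(k: int, n: int, tail: float) -> float:
    """p at which the probability of at most k successes equals `tail`."""
    if k == n:
        return 1.0
    return _bisect(lambda p: lower_tail(k, n, p), tail, increasing=False)


if __name__ == "__main__":
    beta = 0.95
    print(lower_limit(96, 100, (1 - beta) / 2), upper_limit(96, 100, (1 - beta) / 2))
    print(lower_limit(100, 100, 1 - beta))  # one-sided limit for k = N
```

For $k = N = 100$ and $\beta = 0.95$ the one-sided lower limit equals
$0.05^{1/100} \approx 0.9705$, which is close to the corresponding entry
of Table 4.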



4.1 Several output variables 

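Now we assume the output to comprise $n$ variables. Let these variables
be $y_1, \ldots, y_n$. There are several fairly good tests to check
whether they are statistically independent. To independent variables we
can apply the considerations above, but for dependent variables we need
novel considerations. Let

$$ \mathcal{S}_N = \begin{pmatrix} y_{11} & y_{12} & \cdots & y_{1N} \\ y_{21} & y_{22} & \cdots & y_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nN} \end{pmatrix} $$

denote the sample matrix obtained in $N \gg 2n$ independent
observations. With a computer model, an observation is a run.
Introducing the column vectors $\vec{y}_k = (y_{1k}, \ldots, y_{nk})^T$,
$k = 1, \ldots, N$, the sample matrix is written as

$$ \mathcal{S}_N = (\vec{y}_1, \ldots, \vec{y}_N). $$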

Below we expound the sign test for two output variables $y_1$ and $y_2$,
relying on the assumption that their joint distribution function
$G(y_1, y_2)$ is unknown but continuous in either variable. The goal of
the following analysis is to verify the safety conditions
$y_1 < U_T^{(1)}$ and $y_2 < U_T^{(2)}$. When these conditions are
fulfilled with probability $p_{12} = G(U_T^{(1)}, U_T^{(2)}) \approx 1$,
we say the system is safe. Here, as before, the limits $U_T^{(1)}$ and
$U_T^{(2)}$ are determined by the technology. Since $p_{12}$ is unknown,
our job is to construct a confidence interval
$[\gamma_L^{(1,2)}, \gamma_U^{(1,2)}]$ so that it covers $p_{12}$ with
probability $\beta_{12}$. In most cases it suffices to calculate solely
$\gamma_L^{(1,2)}$ and to use the interval $[\gamma_L^{(1,2)}, 1]$ as
confidence interval. Now the column vectors introduced above have two
components. In accordance with our assumption, different vectors are
statistically independent but the components in a given vector are not
necessarily independent. In order to keep the notation as simple as
possible, the event $\{y_1 < U_T^{(1)},\, y_2 < U_T^{(2)}\}$ will be
called a success. If $y_{1k} < U_T^{(1)}$ and $y_{2k} < U_T^{(2)}$, then

$$ \Delta(U_T^{(1)} - y_{1k})\, \Delta(U_T^{(2)} - y_{2k}) = 1, $$

while the product is 0 otherwise. We introduce the statistical function

$$ z_N^{(1,2)} = \sum_{k=1}^{N} \Delta(U_T^{(1)} - y_{1k})\, \Delta(U_T^{(2)} - y_{2k}), $$

which gives the number of successes in the sample of size $N$. Since the
newly introduced random variable is the sum of $N$ independent random
variables, assuming values either 1 or 0, its distribution is binomial.
Using the notation

$$ \mathcal{P}\{ \Delta(U_T^{(1)} - y_1)\, \Delta(U_T^{(2)} - y_2) = 1 \} = p_{12}, $$

we can write

$$ \mathcal{P}\{ z_N^{(1,2)} = k \} = \binom{N}{k} p_{12}^{k} (1 - p_{12})^{N-k} $$

for $k = 0, 1, \ldots, N$. At this point we rejoin the line of thought of
the previous subsection. Instead of repeating the already familiar
argumentation, we add two trivial but important remarks. Let us define
the following two statistical functions:

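$$ z_N^{(1)} = \sum_{i=1}^{N} \Delta(U_T^{(1)} - y_{1i}) $$

and

$$ z_N^{(2)} = \sum_{i=1}^{N} \Delta(U_T^{(2)} - y_{2i}). $$

These two functions are not statistically independent; each is the sum
of $N$ independent random variables with values 1 or 0, therefore one
can write

$$ \mathcal{P}\{ z_N^{(1)} = i \} = \binom{N}{i} p_1^{i} (1 - p_1)^{N-i} $$

and

$$ \mathcal{P}\{ z_N^{(2)} = j \} = \binom{N}{j} p_2^{j} (1 - p_2)^{N-j}, \qquad i, j = 0, 1, \ldots, N, $$

where

$$ p_\ell = \mathcal{P}\{ y_\ell < U_T^{(\ell)} \} = \mathcal{P}\{ \Delta(U_T^{(\ell)} - y_\ell) = 1 \}, \qquad \ell = 1, 2, $$

are unknown probabilities. Applying the method used previously, this time
separately to the samples

$$ \mathcal{S}_N^{(1)} = \{ y_{1j},\; j = 1, \ldots, N \} $$

and

$$ \mathcal{S}_N^{(2)} = \{ y_{2j},\; j = 1, \ldots, N \}, $$

we construct two random intervals $[\gamma_L^{(1)}, 1]$ and
$[\gamma_L^{(2)}, 1]$ covering $p_1$ and $p_2$ with probabilities
$\beta_1$ and $\beta_2$, respectively.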

Obviously, it could occur that the levels $(\beta_1 \mid \gamma_L^{(1)})$
and $(\beta_2 \mid \gamma_L^{(2)})$ corroborate the claim that the
samples $\mathcal{S}_N^{(1)}$ and $\mathcal{S}_N^{(2)}$ separately comply
with safety requirements. This does not mean that we would arrive at the
same conclusion from analyzing the two sets jointly. The reason is that
$y_1$ and $y_2$, the two output random variables, are not statistically
independent. Hence, we should ascertain whether the interval
$[\gamma_L^{(1,2)}, 1]$ covers the probability $p_{12}$ with the
pre-assigned probability $\beta_{12}$. Since
$\gamma_L^{(1,2)} \le \min\{\gamma_L^{(1)}, \gamma_L^{(2)}\}$, the limits
$\gamma_L^{(1)}$ and $\gamma_L^{(2)}$ do not contain information
sufficient to declare the system safe. The decision on safety, when two
output variables are subjected to limitations, should go as follows.
Firstly, we test the hypothesis concerning dependence of the output
variables $y_1$ and $y_2$. If they are dependent, we should estimate the
probability of the event $\{y_1 < U_T^{(1)},\, y_2 < U_T^{(2)}\}$. Only
if they are statistically independent should we estimate the probability
of the events $\{y_1 < U_T^{(1)}\}$ and $\{y_2 < U_T^{(2)}\}$
independently.

Finally, we mention that the generalization of the sign test to $n > 2$
output variables is straightforward; we have to use the statistical
function

$$ z_N^{(1,\ldots,n)} = \sum_{k=1}^{N} \prod_{j=1}^{n} \Delta(U_T^{(j)} - y_{jk}) \tag{22} $$

to evaluate safety based on $N$ observations of the $n$ output
variables. In this manner we obtain the sum of $N$ independent random
variables in expression (22), and the further steps are the same as at
the beginning of the subsection.

Figure 3: Sample a)

Figure 4: Sample b)



An example is given below. We have generated two samples a) and b)
using Monte Carlo simulation; either sample contains $N = 100$
observations (or runs) of two output variables. The samples have been
generated from a bivariate normal distribution with parameters
$m_1 = m_2 = 0$ and $\sigma_1 = \sigma_2 = 1$, but the correlation
coefficient is $C = 0.1$ and $C = 0.7$ in sample a) and b),
respectively. The acceptance range is $[-2, 2]$ for both output
variables. In samples a) and b), four and one points, respectively, lie
outside the acceptance range. The results of the simulation can be seen
in Fig. 3 and in Fig. 4.
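A simulation of this kind can be sketched as follows. The code is an
illustration only: the seeds and function names are hypothetical, so the
counts will not coincide with those of samples a) and b). It generates
correlated standard normal pairs and counts the successes of each
variable separately and of both variables jointly; the joint count can
never exceed either of the separate counts.

```python
import random
from math import sqrt


def simulate_bivariate_normal(n: int, corr: float, seed: int):
    """Generate n pairs of standard normal variables with correlation `corr`."""
    rng = random.Random(seed)  # hypothetical seed
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((z1, corr * z1 + sqrt(1.0 - corr**2) * z2))
    return pairs


def inside(y: float, lower: float = -2.0, upper: float = 2.0) -> bool:
    """True if y lies in the acceptance range [lower, upper]."""
    return lower <= y <= upper


def count_successes(pairs):
    """Return (z1, z2, z12): separate and joint numbers of successes."""
    z1 = sum(inside(a) for a, _ in pairs)
    z2 = sum(inside(b) for _, b in pairs)
    z12 = sum(inside(a) and inside(b) for a, b in pairs)
    return z1, z2, z12


if __name__ == "__main__":
    for label, corr in (("a", 0.1), ("b", 0.7)):
        z1, z2, z12 = count_successes(simulate_bivariate_normal(100, corr, seed=0))
        print(f"sample {label}: z1={z1}, z2={z2}, joint={z12}")
```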

Table 4: Lower confidence limits in a sample of $N = 100$; $k$ is the
number of successes.

  k \ $\beta$   0.90     0.95     0.99
   90           0.8501   0.8362   0.8086
   91           0.8616   0.8482   0.8212
   92           0.8733   0.8602   0.8340
   93           0.8850   0.8725   0.8471
   94           0.8970   0.8850   0.8604
   95           0.9092   0.8977   0.8741
   96           0.9216   0.9108   0.8882
   97           0.9344   0.9242   0.9030
   98           0.9476   0.9383   0.9185
   99           0.9616   0.9534   0.9354
  100           0.9772   0.9704   0.9549



First let us consider sample a). From Tab. 4 one can read that the
interval $[0.9108, 1]$ covers the parameter $p_{12}$ with probability
$\beta_{12} = 0.95$.

When we assess the output variables one by one, we see that the
associated probabilities $p_1$ and $p_2$ are each covered by the
interval $[0.9383, 1]$ with probability $\beta = 0.95$. However tempting
it is to use 0.9383 as the lower bound for the probability in safety
analysis, that number has nothing to do with $p_{12}$ and should not be
used in safety analysis.

Now let us pass on to sample b), where we see a strong correlation
between $y_1$ and $y_2$. From Tab. 4 one can read that the confidence
interval $[0.9383, 1]$ covers $p_{12}$ with probability
$\beta_{12} = 0.95$. From that sample we conclude that the probability
of the event $\{y_1 < U_T^{(1)},\, y_2 < U_T^{(2)}\}$ is at least
0.9383. The single-variable parameters $p_1$ and $p_2$ determined from
sample b) are covered by the intervals $[0.9534, 1]$ and $[0.9383, 1]$,
respectively, at the level $\beta_1 = \beta_2 = 0.95$. Again, however
favorable these numbers are, they should not be used in assessing
safety. The simple numerical example discussed above clearly indicates
the danger awaiting the analyst when his or her judgment is based on
tests performed separately on correlated output variables.

5 Concluding remarks 

The authors have investigated the statistical methods applied to the
safety analysis of nuclear reactors and arrived at alarming conclusions.
Guba and Trosztel [7] carried out a series of calculations with the
well-regarded safety code ATHLET to ascertain the stability of the
results against input uncertainties in a simple experimental situation.
Scrutinizing those calculations, we came to the conclusion [3] that the
ATHLET results may exhibit irregular behavior. A further conclusion is
that the technological limits are incorrectly set [5] when the output
variables are correlated. Another formerly unnoticed conclusion of the
Guba-Trosztel calculations [7] is that certain innocent-looking
parameters (like the wall roughness factor, the number of bubbles per
unit volume, the number of droplets per unit volume) can considerably
influence such output parameters as water levels. The authors are
concerned about the statistical foundation of present-day safety
analysis practices and can only hope that their concerns will turn out
to be a misjudgment on their part.

Until then, the authors suggest applying correct statistical methods in
safety analysis even if it makes the analysis more expensive. It would
be desirable to continue exploring the role of internal parameters (wall
roughness factor, steam-water surface in thermal hydraulics codes,
homogenization methods in neutronics codes) in system safety codes and
to study their effects on the analysis.

In the validation and verification process of a code one carries out a
series of computations. The input data are not precisely determined,
because measured data have an error and calculated data are often
obtained from a more or less accurate model. Some users of large codes
are content with comparing the nominal output obtained from the nominal
input, whereas all the possible inputs should be taken into account when
judging safety. At the same time, any statement concerning safety must
be aleatory, and its merit can be judged only when the probability with
which the statement is true is known. In some cases statistical aspects
of safety are misused, as when the number of runs for several outputs is
correct only for statistically independent outputs, or misinterpreted.

We do not know the probability distribution of the output variables
subjected to safety limitations. At the same time, for some asymmetric
distributions the 0.95|0.95 methodology simply fails: if we repeat the
calculations, in many cases we get a value higher than the basic value,
which means that a limit violation in the calculation becomes more and
more probable in the repeated analysis.

Consistent application of order statistics or the application of the
sign test may offer a way out of the present situation. The authors are
also convinced that efforts should be made

• to study the statistics of the output variables, 

• to study the occurrence of chaos in the analyzed cases. 

All these observations should influence, in safety analysis, the applica- 
tion of best estimate methods, and underline the opinion that any realistic 
modelling and simulation of complex systems must include the probabilistic 
features of the system and the environment. 



Appendix 

Let $\eta$ be a random variable with continuous distribution defined
over the real numbers $\mathbb{R}$, and let the distribution function of
$\eta$ be

$$ \mathcal{P}\{ \eta \le y \} = G(y). \tag{23} $$

We carry out $N$ statistically independent observations of $\eta$. That
operation is called $\mathcal{K}$. We repeat $\mathcal{K}$ $n+1$ times.
We group the observed values into the following $(n+1) \times N$
matrix:



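$$ \begin{pmatrix} \eta_{01} & \eta_{02} & \cdots & \eta_{0N} \\ \eta_{11} & \eta_{12} & \cdots & \eta_{1N} \\ \vdots & \vdots & \ddots & \vdots \\ \eta_{n1} & \eta_{n2} & \cdots & \eta_{nN} \end{pmatrix} \tag{24} $$

Let $\zeta_j = \max_{1 \le k \le N} \eta_{jk}$ denote the maximum
observed in operation $j$.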

Lemma. Since the distribution function $G(y)$ is monotonically
increasing and continuous, the following equation holds for
$0 < \gamma < 1$:

$$ \mathcal{P}\left\{ \max_{1 \le k \le N} \eta_{jk} > G^{-1}(\gamma) \right\} = \mathcal{P}\left\{ \int_{-\infty}^{\zeta_j} dG(y) > \gamma \right\} = 1 - \gamma^{N}, \tag{25} $$

where $G^{-1}(\gamma) = Q_\gamma$ is the $\gamma$ quantile of the
distribution function $G(y)$.

The presented Lemma is well known, we omit its proof. Now we turn 
to the determination of the probability distribution of the largest sample 
elements. 

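Theorem. The probability of the event that among the independent random
variables $\zeta_1, \ldots, \zeta_n$ there are at least $k \le n$
greater than $\zeta_0$ is

$$ P_k = 1 - \frac{k}{n+1}. \tag{26} $$

Proof. Since $\eta_{jk}$, $j = 0, 1, \ldots, n$; $k = 1, \ldots, N$ are
independent and identically distributed, we have

$$ \mathcal{P}\{ \zeta_j \le z \} = \prod_{k=1}^{N} \mathcal{P}\{ \eta_{jk} \le z \} = [G(z)]^{N} = H(z). \tag{27} $$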

In other words, $H(z)$ is the probability of $\zeta_j$ not being larger
than $z \in \mathbb{R}$ for any $j = 0, 1, \ldots, n$. Let
$0 \le \nu_n(z) \le n$ denote the number of those variables from among
$\zeta_1, \ldots, \zeta_n$ which are greater than $z$. Obviously,

$$ \mathcal{P}\{ \nu_n(z) = \ell \} = f_\ell^{(n)}(z) = \binom{n}{\ell} (1 - H(z))^{\ell}\, (H(z))^{n-\ell}. \tag{28} $$



Let $P_k$ stand for the probability that from among the random
variables $\zeta_1, \ldots, \zeta_n$ at least $k \le n$ are greater than
$\zeta_0$, which may take any value from $\mathbb{R}$. We get

$$ P_k = \sum_{\ell=k}^{n} p_\ell = \sum_{\ell=k}^{n} \int_{-\infty}^{+\infty} f_\ell^{(n)}(z)\, dH(z). \tag{29} $$

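The determination of the probabilities $p_\ell$ is straightforward:

$$ p_\ell = \int_{-\infty}^{+\infty} f_\ell^{(n)}(z)\, dH(z) = \binom{n}{\ell} \int_{-\infty}^{+\infty} (1 - H(z))^{\ell}\, (H(z))^{n-\ell}\, dH(z). \tag{30} $$

The integrals are evaluated without difficulty:

$$ p_\ell = \binom{n}{\ell} \int_0^1 (1-u)^{\ell}\, u^{n-\ell}\, du = \frac{1}{n+1}. \tag{31} $$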

As we see, $p_\ell$ is independent of $\ell$, and using Eq. (29) we get

$$ P_k = \sum_{\ell=k}^{n} \frac{1}{n+1} = \frac{n-k+1}{n+1} = 1 - \frac{k}{n+1}. \tag{32} $$

Q.E.D. 
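The result can be verified with exact rational arithmetic. The sketch
below is an illustration with hypothetical function names; it uses the
closed form of the Beta integral,
$\int_0^1 (1-u)^{\ell} u^{n-\ell}\, du = \ell!\,(n-\ell)!/(n+1)!$, to
evaluate each term and then sums the terms from $k$ to $n$.

```python
from fractions import Fraction
from math import comb, factorial


def p_exact(n: int, l: int) -> Fraction:
    """Probability that exactly l of the n maxima exceed the basic maximum."""
    # C(n, l) times the Beta integral l! (n - l)! / (n + 1)!
    return Fraction(comb(n, l) * factorial(l) * factorial(n - l), factorial(n + 1))


def prob_at_least(n: int, k: int) -> Fraction:
    """Probability that at least k of the n maxima exceed the basic maximum."""
    return sum((p_exact(n, l) for l in range(k, n + 1)), Fraction(0))


if __name__ == "__main__":
    print(p_exact(12, 5))            # the same value for every l
    print(prob_at_least(1000, 224))  # equals 1 - 224/1001
```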

We add two remarks. 

1. Remark 1. Whichever we choose from among the random variables
$\zeta_0, \zeta_1, \ldots, \zeta_n$, with probability $P_k$ we find
among the others at least $k$ exceeding the one chosen first. (Since
$\zeta_0, \zeta_1, \ldots, \zeta_n$ are continuous random variables, the
probability of two values being identical is zero.)

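2. Remark 2. Let $\Lambda$ be the number of those variables
$\zeta_{j_1}, \zeta_{j_2}, \ldots, \zeta_{j_n}$ which are greater than a
given $\zeta_{j_0}$. Clearly, $\Lambda$ is a random variable; its
expectation value is

$$ \mathbf{E}\{\Lambda\} = \sum_{\ell=0}^{n} \ell\, p_\ell = \frac{n}{2}, \tag{33} $$

the variance being

$$ \mathbf{D}^2\{\Lambda\} = \sum_{\ell=0}^{n} \left( \ell - \frac{n}{2} \right)^2 p_\ell = \frac{1}{6}\, n \left( 1 + \frac{n}{2} \right). \tag{34} $$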